25 research outputs found

    Exploiting Data Skew for Improved Query Performance

    Full text link
    Analytic queries enable sophisticated large-scale data analysis within many commercial, scientific and medical domains today. Data skew is a ubiquitous feature of these real-world domains. In a retail database, some products are typically much more popular than others. In a text database, word frequencies follow a Zipf distribution with a small number of very common words, and a long tail of infrequent words. In a geographic database, some regions have much higher populations (and data measurements) than others. Current systems do not make the most of caches for exploiting skew. In particular, a whole cache line may remain cache resident even though only a small part of the cache line corresponds to a popular data item. In this paper, we propose a novel index structure for repositioning data items to concentrate popular items into the same cache lines. The net result is better spatial locality, and better utilization of limited cache resources. We develop a theoretical model for analyzing the cache behavior, and implement database operators that are efficient in the presence of skew. Our experiments on real and synthetic data show that exploiting skew can significantly improve in-memory query performance. In some cases, our techniques can speed up queries by over an order of magnitude

    Conditionally Risk-Averse Contextual Bandits

    Full text link
    Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations because they might trade off degraded worst-case behaviour for better average performance. Designing a risk-averse contextual bandit is challenging because exploration is necessary but risk-aversion is sensitive to the entire distribution of rewards; nonetheless we exhibit the first risk-averse contextual bandit algorithm with an online regret guarantee. We conduct experiments from diverse scenarios where worst-case outcomes should be avoided, from dynamic pricing, inventory management, and self-tuning software; including a production exascale data processing system

    Evaluating multi-way joins over discounted hitting time

    No full text
    The prevalence of graphs in emerging applications has recently raised a lot of research interests. To acquire interesting information hidden in large graphs, tasks including link prediction, collaborative recommendation, and reputation ranking, all make use of proximities between graph nodes. The discounted hitting time (DHT), which is a random-walk similarity measure for graph node pairs, has shown to be useful in various applications. In this thesis, we examine a novel query, called the multi-way join (or n-way join), over DHT scores. Given a graph and n sets of nodes, the n-way join retrieves a ranked list of n-tuples with the k highest scores, according to some aggregation function of DHT values. By extracting such top-k results, this query enables the analysis and prediction of various complex relationships among n sets of nodes on a large graph. Since an n-way join is expensive to evaluate, we develop the Partial Join algorithm (or PJ). This solution decomposes an n-way join into a number of top-m 2-way joins, and combines their results to construct the answer of the n-way join. Since the process of PJ may necessitate the computation of top-(m + 1) 2-way joins, we study an incremental solution, which saves the trouble of recomputation and allows the results of top-(m+1) 2-way join to be derived quickly from the top-m 2-way join results earlier computed. For better performance, we further examine efficient processing algorithms and pruning techniques for 2-way joins. Through extensive experiments on three real graph datasets, we show that the proposed PJ algorithm accurately evaluates n-way joins, and is four orders of magnitude faster than basic solutions.published_or_final_versionComputer ScienceMasterMaster of Philosoph

    Evaluating Multi-Way Joins over Discounted Hitting Time

    Get PDF
    Abstract—The discounted hitting time (DHT), which is a random-walk similarity measure for graph node pairs, is useful in various applications, including link prediction, collaborative recommendation, and reputation ranking. We examine a novel query, called the multi-way join (or n-way join), on DHT scores. Given a graph and n sets of nodes, the n-way join retrieves a set of n-tuples with the k highest scores, according to some aggregation function of DHT values. This query enables analysis and prediction of complex relationship among n sets of nodes. Since an n-way join is expensive to compute, we develop the Partial Join algorithm (or PJ). This solution decomposes an n-way join into a number of top-m 2-way joins, and combines their results to construct the answer of the n-way join. Since PJ may necessitate the computation of top-(m + 1) 2-way joins, we study an incremental solution, which allows the top-(m + 1) 2-way join to be derived quickly from the top-m 2-way join results earlier computed. We further examine fast processing and pruning algorithms for 2-way joins. An extensive evaluation on three real datasets shows that PJ accurately evaluates n-way joins, and is four orders of magnitude faster than basic solutions. I

    Supplementary Online Material

    No full text
    NiO thickness dependence of the exchange bias field,The derivation the NiO thickness dependence of SMR, MR measurement for control sample, and harmonic Hall measurement

    Realization of Multi‐Level State and Artificial Synapses Function in Stacked (Ta/CoFeB/MgO)N Structures

    No full text
    Abstract Spintronic devices can realize multi‐state storage and be used to simulate artificial synapses or artificial neurons, which makes them have promising application prospect in the field of artificial neural networks (ANN). This work investigates the current‐induced magnetization reversal in stacked (Ta/CoFeB/MgO)N structures and their application in ANN. It is demonstrated that the complete current‐induced magnetization reversal with large intermediate transition region can be achieved in the sample with N = 2. The magneto‐optical Kerr microscope imaging shows that the large transition region for the sample is ascribed to the “layer‐by‐layer” reversal, owing to the difference of the coercivity of two CoFeB layers. In addition, the simulation of artificial synapses and artificial neurons function based on current‐induced magnetization reversal in the sample is also demonstrated. These results substantiate the stacked (Ta/CoFeB/MgO)N structures as a promising platform for realizing the multi‐level state and artificial synapses function, and its potential application in the field of ANN

    Predictive deployment of UAV base stations in wireless networks:machine learning meets contract theory

    No full text
    Abstract In this paper, a novel framework is proposed to enable a predictive deployment of unmanned aerial vehicles (UAVs) as temporary base stations (BSs) to complement ground cellular systems in face of downlink traffic overload. First, a novel learning approach, based on the weighted expectation maximization (WEM) algorithm, is proposed to estimate the user distribution and the downlink traffic demand. Next, to guarantee a truthful information exchange between the BS and UAVs, using the framework of contract theory, an offload contract is developed, and the sufficient and necessary conditions for having a feasible contract are analytically derived. Subsequently, an optimization problem is formulated to deploy an optimal UAV onto the hotspot area in a way that the utility of the overloaded BS is maximized. Simulation results show that the proposed WEM approach yields a prediction error of around 10%. Compared with the expectation maximization and k-mean approaches, the WEM method shows a significant advantage on the prediction accuracy, as the traffic load in the cellular system becomes spatially uneven. Furthermore, compared with two event-driven deployment schemes based on the closest-distance and maximal-energy metrics, the proposed predictive approach enables UAV operators to provide efficient communication service for hotspot users in terms of the downlink capacity, energy consumption and service delay. Simulation results also show that the proposed method significantly improves the revenues of both the BS and UAV networks, compared with two baseline schemes
    corecore